8 research outputs found

    Capture, Learning, and Synthesis of 3D Speaking Styles

    Audio-driven 3D facial animation has been widely explored, but achieving realistic, human-like performance is still unsolved. This is due to the lack of available 3D datasets, models, and standard evaluation metrics. To address this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans captured at 60 fps and synchronized audio from 12 speakers. We then train a neural network on our dataset that factors identity from facial motion. The learned model, VOCA (Voice Operated Character Animation), takes any speech signal as input - even speech in languages other than English - and realistically animates a wide range of adult faces. Conditioning on subject labels during training allows the model to learn a variety of realistic speaking styles. VOCA also provides animator controls to alter speaking style, identity-dependent facial shape, and pose (i.e. head, jaw, and eyeball rotations) during animation. To our knowledge, VOCA is the only realistic 3D facial animation model that is readily applicable to unseen subjects without retargeting. This makes VOCA suitable for tasks like in-game video, virtual reality avatars, or any scenario in which the speaker, speech, or language is not known in advance. We make the dataset and model available for research purposes at http://voca.is.tue.mpg.de.
    Comment: To appear in CVPR 2019
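
    The abstract describes a model that maps a speech signal, conditioned on a subject label, to facial motion. As a rough illustration (not VOCA's actual architecture), the sketch below shows how one-hot style conditioning of per-frame speech features might drive per-vertex mesh offsets; the dimensions, the animate function, and the random stub decoder are all hypothetical stand-ins.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    NUM_SUBJECTS = 8      # training identities used as style conditions (assumed)
    NUM_VERTICES = 5023   # vertex count of the template face mesh (assumed)
    FEAT_DIM = 29         # per-frame speech feature dimension (assumed)

    # Stand-in for a trained decoder: one linear map from the conditioned
    # feature vector to per-vertex 3D displacements over the template mesh.
    W = rng.normal(0.0, 1e-3, size=(FEAT_DIM + NUM_SUBJECTS, NUM_VERTICES * 3))

    def animate(speech_features: np.ndarray, subject_id: int) -> np.ndarray:
        """speech_features: (T, FEAT_DIM), one row per animation frame.
        Returns (T, NUM_VERTICES, 3) vertex offsets over the template mesh."""
        style = np.eye(NUM_SUBJECTS)[subject_id]   # one-hot speaking-style code
        frames = []
        for feat in speech_features:
            x = np.concatenate([feat, style])      # condition on the subject label
            frames.append((x @ W).reshape(NUM_VERTICES, 3))
        return np.stack(frames)

    offsets = animate(rng.normal(size=(60, FEAT_DIM)), subject_id=3)  # 1 s at 60 fps
    ```

    Because the style code is an input rather than baked into the weights, the same speech can be replayed with any learned speaking style, which is the kind of animator control the abstract refers to.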

    Figure-ground modulation in awake primate thalamus

    Figure-ground discrimination refers to the perception of an object, the figure, against a nondescript background. Neural mechanisms of figure-ground detection have been associated with feedback interactions between higher centers and primary visual cortex and have been held to index the effect of global analysis on local feature encoding. Here, in recordings from visual thalamus of alert primates, we demonstrate a robust enhancement of neuronal firing when the figure, as opposed to the ground, component of a motion-defined figure-ground stimulus is located over the receptive field. In this paradigm, visual stimulation of the receptive field and its near environs is identical across both conditions, suggesting the response enhancement reflects higher integrative mechanisms. It thus appears that cortical activity generating the higher-order percept of the figure is simultaneously reentered into the lowest level that is anatomically possible (the thalamus), so that the signature of the evolving representation of the figure is imprinted on the input driving it in an iterative process.
    Funding: Biotechnology and Biological Sciences Research Council (United Kingdom), G022305/1; Medical Research Council (United Kingdom), G070153

    Animation Synthesis Triggered by Vocal Mimics

    We propose a method that leverages the naturally time-related expressivity of the voice to control an animation composed of a set of short events. Users record themselves mimicking onomatopoeia sounds such as "Tick", "Pop", or "Chhh", each of which is associated with a specific animation event. The recorded soundtrack is automatically analyzed to extract the time and type of every sound. We then synthesize an animation in which each event's type and timing match the soundtrack. Beyond being a natural way to control animation timing, we demonstrate that multiple stories can be generated efficiently by recording different voice sequences, and that using more than one soundtrack allows different characters to be controlled with overlapping actions.
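
    The analysis step described here (detecting when each mimicked sound occurs and which type it is) can be illustrated with a toy pipeline. The sketch below uses a simple short-time-energy onset detector and a zero-crossing-rate classifier as placeholders; neither is the paper's actual method, and all names and thresholds are assumptions.

    ```python
    import numpy as np

    def detect_events(signal: np.ndarray, sr: int, hop: int = 512,
                      threshold: float = 0.02) -> list[tuple[float, str]]:
        """Return (time_in_seconds, event_type) pairs for each detected sound."""
        # Short-time energy of hop-sized frames.
        frames = signal[: len(signal) // hop * hop].reshape(-1, hop)
        energy = (frames ** 2).mean(axis=1)
        # An onset is a frame whose energy rises above the threshold.
        rising = (energy[1:] > threshold) & (energy[:-1] <= threshold)
        return [((i + 1) * hop / sr, classify(frames[i + 1]))
                for i in np.flatnonzero(rising)]

    def classify(frame: np.ndarray) -> str:
        """Toy stand-in for onomatopoeia recognition: sustained noisy bursts
        ("Chhh") have a higher zero-crossing rate than impulsive ones ("Pop")."""
        zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2
        return "chhh" if zcr > 0.3 else "pop"

    # Example: a 1 s recording at 16 kHz with a decaying "Pop" tone and a
    # noisy "Chhh" burst; each detected event could trigger an animation clip.
    sr, t = 16000, np.arange(16000)
    signal = np.zeros(sr)
    signal[4000:4200] = np.sin(t[:200] * 0.5) * np.exp(-t[:200] / 100)   # "Pop"
    signal[9600:12000] = np.random.default_rng(1).normal(0, 0.2, 2400)   # "Chhh"
    events = detect_events(signal, sr)   # -> [(0.224, 'pop'), (0.608, 'chhh')]
    ```

    An animation system would then replay, for each (time, type) pair, the short clip bound to that event type, which is what lets different recordings produce different stories from the same clip library.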

    Nitric Oxide and Synaptic Dynamics in the Adult Brain: Physiopathological Aspects
